adaptive scheme
High-dimensional Adaptive MCMC with Reduced Computational Complexity
Hird, Max, Livingstone, Samuel
We propose an adaptive MCMC method that learns a linear preconditioner which is dense in its off-diagonal elements but sparse in its parametrisation. Due to this sparsity, we achieve a per-iteration computational complexity of $O(m^2d)$ for a user-determined parameter $m$, compared with the $O(d^2)$ complexity of existing adaptive strategies that can capture correlation information from the target. Diagonal preconditioning has an $O(d)$ per-iteration complexity, but is known to fail in the case that the target distribution is highly correlated, see \citet[Section 3.5]{hird2025a}. Our preconditioner is constructed using eigeninformation from the target covariance which we infer using online principal components analysis on the MCMC chain. It is composed of a diagonal matrix and a product of carefully chosen reflection matrices. On various numerical tests we show that it outperforms diagonal preconditioning in terms of absolute performance, and that it outperforms traditional dense preconditioning and multiple diagonal plus low-rank alternatives in terms of time-normalised performance.
ADMM without a Fixed Penalty Parameter: Faster Convergence with New Adaptive Penalization
Alternating direction method of multipliers (ADMM) has received tremendous interest for solving numerous problems in machine learning, statistics and signal processing. However, it is known that the performance of ADMM and many of its variants is very sensitive to the penalty parameter of a quadratic penalty applied to the equality constraints. Although several approaches have been proposed for dynamically changing this parameter during the course of optimization, they do not yield theoretical improvement in the convergence rate and are not directly applicable to stochastic ADMM. In this paper, we develop a new ADMM and its linearized variant with a new adaptive scheme to update the penalty parameter. Our methods can be applied under both deterministic and stochastic optimization settings for structured non-smooth objective function. The novelty of the proposed scheme lies at that it is adaptive to a local sharpness property of the objective function, which marks the key difference from previous adaptive scheme that adjusts the penalty parameter per-iteration based on certain conditions on iterates. On theoretical side, given the local sharpness characterized by an exponent $\theta\in(0, 1]$, we show that the proposed ADMM enjoys an improved iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$\footnote{$\widetilde O()$ suppresses a logarithmic factor.} in the deterministic setting and an iteration complexity of $\widetilde O(1/\epsilon^{2(1-\theta)})$ in the stochastic setting without smoothness and strong convexity assumptions. The complexity in either setting improves that of the standard ADMM which only uses a fixed penalty parameter. On the practical side, we demonstrate that the proposed algorithms converge comparably to, if not much faster than, ADMM with a fine-tuned fixed penalty parameter.
Achieving budget-optimality with adaptive schemes in crowdsourcing
Adaptive schemes, where tasks are assigned based on the data collected thus far, are widely used in practical crowdsourcing systems to efficiently allocate the budget. However, existing theoretical analyses of crowdsourcing systems suggest that the gain of adaptive task assignments is minimal. To bridge this gap, we investigate this question under a strictly more general probabilistic model, which has been recently introduced to model practical crowdsourcing data sets. Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy, and introduce a novel adaptive scheme that matches this fundamental limit. We further quantify the gain of adaptivity, by comparing the trade-off with the one for non-adaptive schemes, and confirm that the gain is significant and can be made arbitrarily large depending on the distribution of the difficulty level of the tasks at hand.
ADMM without a Fixed Penalty Parameter: Faster Convergence with New Adaptive Penalization
Alternating direction method of multipliers (ADMM) has received tremendous interest for solving numerous problems in machine learning, statistics and signal processing. However, it is known that the performance of ADMM and many of its variants is very sensitive to the penalty parameter of a quadratic penalty applied to the equality constraints. Although several approaches have been proposed for dynamically changing this parameter during the course of optimization, they do not yield theoretical improvement in the convergence rate and are not directly applicable to stochastic ADMM. In this paper, we develop a new ADMM and its linearized variant with a new adaptive scheme to update the penalty parameter. Our methods can be applied under both deterministic and stochastic optimization settings for structured non-smooth objective function. The novelty of the proposed scheme lies at that it is adaptive to a local sharpness property of the objective function, which marks the key difference from previous adaptive scheme that adjusts the penalty parameter per-iteration based on certain conditions on iterates. On theoretical side, given the local sharpness characterized by an exponent $\theta\in(0, 1]$, we show that the proposed ADMM enjoys an improved iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$\footnote{$\widetilde O()$ suppresses a logarithmic factor.} in the deterministic setting and an iteration complexity of $\widetilde O(1/\epsilon^{2(1-\theta)})$ in the stochastic setting without smoothness and strong convexity assumptions. The complexity in either setting improves that of the standard ADMM which only uses a fixed penalty parameter. On the practical side, we demonstrate that the proposed algorithms converge comparably to, if not much faster than, ADMM with a fine-tuned fixed penalty parameter.
Achieving budget-optimality with adaptive schemes in crowdsourcing
Adaptive schemes, where tasks are assigned based on the data collected thus far, are widely used in practical crowdsourcing systems to efficiently allocate the budget. However, existing theoretical analyses of crowdsourcing systems suggest that the gain of adaptive task assignments is minimal. To bridge this gap, we investigate this question under a strictly more general probabilistic model, which has been recently introduced to model practical crowdsourcing data sets. Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy, and introduce a novel adaptive scheme that matches this fundamental limit. We further quantify the gain of adaptivity, by comparing the trade-off with the one for non-adaptive schemes, and confirm that the gain is significant and can be made arbitrarily large depending on the distribution of the difficulty level of the tasks at hand.
Reviewer # 1
DQN paper that we applied to all the baselines and our method for the experiment. Therefore, we are certain that we have provided a fair comparison. We apologize for the source of confusion about the update period in Appendix D. We meant "At each " We will correct this to prevent confusion. There have been many recent related works including the ones the Reviewer 1 cited. Unfortunately, we could not provide detailed differences/advances of them all in the limited amount of manuscript.
Adaptive Neural Quantum States: A Recurrent Neural Network Perspective
McNaughton, Jake, Hibat-Allah, Mohamed
Neural-network quantum states (NQS) are powerful neural-network ansätzes that have emerged as promising tools for studying quantum many-body physics through the lens of the variational principle. These architectures are known to be systematically improvable by increasing the number of parameters. Here we demonstrate an Adaptive scheme to optimize NQSs, through the example of recurrent neural networks (RNN), using a fraction of the computation cost while reducing training fluctuations and improving the quality of variational calculations targeting ground states of prototypical models in one- and two-spatial dimensions. This Adaptive technique reduces the computational cost through training small RNNs and reusing them to initialize larger RNNs. This work opens up the possibility for optimizing graphical processing unit (GPU) resources deployed in large-scale NQS simulations.
Reviews: Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
The paper tightens the analysis of local SGD that periodically averages the models at different nodes. It improves the bound on the communication rounds suffice to achieve linear speedups and relaxes some of the assumptions used in previous works. The authors also develop an adaptive scheme to choose the communication rounds based on the intuition from their theoretical results. They also provided empirical results on logistic regression problem with epsilon dataset to support the theoretical results. Comments: The paper is well written and east to follow.
Reviews: Gradient Dynamics of Shallow Univariate ReLU Networks
In this paper, the authors are using a neural network with one hidden layer and one output layer. The activation function is ReLU, and each hidden unit also has a bias term that can be trained. The input is 1-dimensional, i.e., a scalar, so the task is equivalent to interpolation. The loss function is square loss, and the authors use gradient descent to learn the network weights. The authors use an over-parametrization scheme where the number of nodes in the hidden layer tends to infinity, which means the network is infinitely wide. There are two main learning schemes mentioned in this paper: Kernel and adaptive.
Reviews: An Empirical Study on The Properties of Random Bases for Kernel Methods
Summary: The authors provided an empirical study contrasting neural networks and kernel methods, with a focus on how random and adaptive schemes would make efficient use of features in order to improve quality of learning, at four levels of abstraction: data-agnostic random basis (baseline kernel machines with traditional random features), unsupervised data-adaptive basis for better approximation of kernel function, supervised data-label-adaptive basis by kernel target alignment, discriminatively adaptive basis (neural nets). The paper concluded with several suggestions and caveats for efficient use of random features in practice. Comments: - 1 - Line 123, especially for sake of comparing UAB case where the underlying assumption is that using the true kernel function k in prediction yields the "best" performance so that UAB tries to approximate it, I would suggest testing in experiments a baseline model that utilizes the true kernel function k in prediction. Also this would suggest, for example in Figure 1 at which point of the KAE curve the accuracy is sufficiently good (despite many theoretical results available). However, in order to support those conclusions to a definitively convincing extent, more datasets should be needed. For example, the performance scores in Tab. 1 do not seem to be too significantly different marginally for each task.